## Aug 25, 2022 | RISC-V Perf Analysis SIG Meeting

Attendees: Beeman Strong, tech.meetings@riscv.org, LIU Zhiwei

## Notes

- Attendees: Atish, Philipp, Anup, Zhiwei, Zhao M, Beeman, Greg F
- Slides and video here
- Atish reviewing PMU SBI enhancements
- Describing the number of SBI ECALLs and counter reads (which trap under KVM) for perf record and perf stat usage
- Greg: Is it typical to have multiple counters configured to interrupt on overflow?
  - Beeman: It does get used; have heard this from customers. Not sure whether it is typical.
  - Philipp: In the ARM world it is common to sample on both retired and speculative instructions at once.
  - Atish: perf record takes an interrupt on overflow for all selected events. There is no way to combine perf stat and perf record, where one counter is configured for interrupts and the others simply run and are snapshotted when sampling on overflow.
  - Greg: seems typical to want to collect, say, the number of cycles that pass between every 1000 cache misses. Sounds like that's not possible here.
  - AI: Atish to follow up and determine whether it's possible to monitor events in addition to those configured for interrupts
- Why can't KVM allow VM reads without trapping?
  - Because KVM does counter remapping
- Have to set up shared memory at VM boot time?
  - Yes, part of discovery, else use old interface
- Potential downside to snapshot use is that shared memory region may not be in cache, in which case the first access will be slow (but subsequent accesses should hit L1)
  - So if using perf stat with one counter, snapshot is probably slower than the legacy SBI interface, since an LLC/DRAM access is likely slower than a CSR read. But as the number of counters increases, the advantages of snapshot quickly overtake the legacy interface.
  - Cache is not evicted on context switch, though much of L1, and maybe L2, will likely have been evicted by the time your context returns
  - Sampling rate may dictate the likelihood of shared memory remaining in, say, L3 vs DRAM
- Today perf stat starts and stops each counter individually, which seems inefficient. Even without snapshot, bulk start/stop is possible, and with snapshot bulk reads are possible as well.
- Next meeting: brainstorm on filling the branch history gap (à la LBRs, BRBEs, ...)
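
The snapshot versus legacy trade-off discussed above can be illustrated with a rough cost model. This is a minimal sketch, not a measurement: the function names and all latency values (`legacy_read`, `cold_miss`, `l1_hit`) are hypothetical placeholders, and it assumes, as in the notes, that only the first access to the shared memory region misses the cache while later accesses hit L1.

```python
def legacy_cost(num_counters, legacy_read):
    """Total cost when every counter is read individually (legacy path)."""
    return num_counters * legacy_read


def snapshot_cost(num_counters, cold_miss, l1_hit):
    """Total cost when counters are read from the shared memory region.

    Assumption: the first access misses (LLC/DRAM), the rest hit L1.
    """
    if num_counters == 0:
        return 0
    return cold_miss + (num_counters - 1) * l1_hit


def break_even(legacy_read, cold_miss, l1_hit, max_counters=64):
    """Smallest counter count at which snapshot is strictly cheaper, else None."""
    for n in range(1, max_counters + 1):
        if snapshot_cost(n, cold_miss, l1_hit) < legacy_cost(n, legacy_read):
            return n
    return None


if __name__ == "__main__":
    # Hypothetical latencies in cycles, chosen only for illustration.
    n = break_even(legacy_read=50, cold_miss=200, l1_hit=4)
    print("snapshot becomes cheaper at", n, "counters")
```

With a single counter the cold miss dominates and the legacy path wins; the crossover point depends entirely on the latencies plugged in, which would need to be measured on real hardware.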
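
The bulk start/stop point above can be sketched as follows. The SBI PMU start and stop calls take a counter index base plus a bitmask, so several counters can be covered by a single ECALL. The helper `bulk_groups` below is hypothetical and only computes the `(base, mask)` pairs; it issues no SBI call, and the `xlen` parameter is an assumption about the mask width.

```python
def bulk_groups(counter_indices, xlen=64):
    """Group counter indices into (base, mask) pairs, one pair per call.

    Bit i of mask selects counter base + i; each mask is at most xlen bits wide.
    """
    groups = []
    base, mask = None, 0
    for idx in sorted(set(counter_indices)):
        # Open a new group when idx no longer fits in the current mask.
        if base is None or idx - base >= xlen:
            if base is not None:
                groups.append((base, mask))
            base, mask = idx, 0
        mask |= 1 << (idx - base)
    if base is not None:
        groups.append((base, mask))
    return groups


if __name__ == "__main__":
    counters = [3, 4, 7]
    groups = bulk_groups(counters)
    print("per-counter calls:", len(counters))
    print("bulk calls:", len(groups))
    for base, mask in groups:
        print("base", base, "mask", bin(mask))
```

Starting counters one at a time costs one call per counter, whereas the grouped form needs one call per `(base, mask)` pair, which is a single call whenever all selected counters fall within one mask width.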

## Action items

| Owner | Date | Action item |
|-------|------|-------------|
| Atish Kumar Patra | Aug 25, 2022 | Check on how to read multiple counters in perf when taking an interrupt on one |
| Beeman Strong | Jul 28, 2022 | Reach out about proprietary performance analysis tools |
| Beeman Strong | Jul 28, 2022 | Reach out to VMware about PMU enabling |
| Beeman Strong | Jul 28, 2022 | Talk to security HC about counter delegation |